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© Distributed database management 



© In parallel database management systems, 
database update requests typically result in activity 
at several nodes. Rollback of all updates is required 
if the update of any node fails. A coordination pro- 
cess monitors for failure or success of updates. The 
coordinator further provides for distinguishing activi- 



ties that have taken place at any given node from 
the other nodes for different database update re- 
quests. Savepoints are local. This allows rollback of 
a selected update without affecting nodes which did 
not process the update. 
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The invention relates to parallel or homoge- 
neous distributed databases and more particularly 
to distributed update requests within a transaction 
on a parallel database. 

A distributed database system is one imple- < 
mented over a communication network linking a 
plurality of computers. The database itself is par- 
titioned among the individual computers. A 
database request from an application program run- 
ning on one computer can be sent to one or more n 
remote computers over the network. To handle 
such a request a coordinator running on a com- 
puter (either the computer whore the application 
program is executing or a remote computer which 
exclusively handles coordination) decides whether n 
the transaction owning the request is to be commit- 
ted or rolled back. A true distributed request is 
permitted to span multiple computers. Its operation 
is transparent to the user and to the application 
program. For example, the request may involve the 20 
virtual union of partitions on mutually remote com- 
puters. 

Throughput in a distributed database system 
should be better than in a nondistributed database 
system because requests from a transaction can 25 
be divided among appropriate remote computers or 
span partitions where more than one partition is 
appropriate. The handling of requests in parallel is 
readily handled as long as no modification of the 
database is required. A request which involves any 30 
modification of the records of the database has 
forced serialization (i.e. completion) of requests in 
order to allow for distribution of the request without 
compromising rollback of the transaction if later 
required. 55 

A distributed transaction (also called a distrib- 
uted unit of work) may include several distributed 
update requests. Requiring serialization of all oper- 
ations from the requests compromises the principle 
of executing the transaction in parallel. Throughput 40 
then falls below possible levels. 

The invention is directed to providing an im- 
proved system and method for supporting distrib- 
uted update requests within a transaction on a 
parallel database. 45 

To achieve this, the invention provides a par- 
allel database management system, comprising: a 
communications network having a plurality of com- 
puters connected thereto; a database distributed 
among partitions on the plurality of computers; 50 
characterised by 

a plurality of subordinate processes resident on 
the plurality of computers for generating local 
savepoints for the partitions, for executing a re- 
quest instance and for signalling over the network 55 
failure or success of execution, each computer 
comprising memory for storing the local savepoints 
between the requests; a coordinator resident on at 



least one of the computers for receiving requests 
from applications, for distributing over the network 
request instances and request savepoint data relat- 
ing to the request to selected ones of the subordi- 
nate processes and for issuing a rollback request 
with reference data for comparison to the local 
savepoints upon receipt of a signal indicating fail- 
ure of execution by a subordinate process; and the 
subordinate processes being arranged to respond 
to the reference data matching the local savepoints 
by executing a rollback. 

In one embodiment, responsive to initiation of a 
transaction, a request counter for a coordination 
process is initialized. Responsive to each receipt of 
a request for distribution by the coordination pro- 
cess, the request counter for the coordination pro- 
cess will be incremented. Request instances, in- 
cluding savepoint data such as the current request 
counter, are generated for each request and distrib- 
uted to selected subordinate processes on different 
nodes for the partitions of the database. Respon- 
sive to receipt of a request instance on a selected 
subordinate process, a local savepoint with the 
request savepoint data and locally generated 
savepoint data is stored for each selected subordi- 
nate process. An attempt to execute the request 
instance is then made. Success or failure of the 
execution attempt is returned to the coordination 
process for each request instance. Responsive to 
indication of failure of execution for any request 
instance, a rollback is performed on each subordi- 
nate process on which the local savepoint indicates 
execution of a request instance for the request. 

The invention will best be understood by refer- 
ence to the following detailed description of an 
embodiment thereof when read in conjunction with 
the accompanying drawings, wherein: 

Figure 1 is a block diagram of a local area 
network indicating nodes of residence of various 
supporting processes; 

Figure 2 is a high level logical flow chart of a 
transaction initialization process; 
Figure 3 is a high level logical flow chart of a 
process for a coordinator process in a distrib- 
uted database manager system; 
Figure 4 is a high level logical flow chart for a 
subordinate process relating to handling of a 
request; and 

Figure 5 is a high level logical flow chart for a 
subordinate process relating to handling of a 
rollback. 

Figure 1 is a block diagram of local area 
network 8. Description of the invention in connec- 
tion with a local area network having a token ring 
geometry is intended as illustrative only and not 
intended to imply that the invention is not ap- 
plicable in different types of distributed processing 
environments, such as wide area networks. Com- 



2 



BNSDOCID: <EP 065781 3A1J. > 



3 



EP 0 657 813 A1 



4 



puter 11 is representative of node supporting a 
partition of a database in a parallel database man- 
agement system (PDBMS). Computer 11 is a con- 
ventional machine and includes a central process- 
ing unit 60, a random access memory (RAM) mem- 
ory 64, and a direct access storage device (DASD) 

56 storing a database partition and a second DASD 

57 storing a transaction log of data changes made 
to the database partition stored in DASD 56. 

Resident in memory 64 are an operating sys- 
tem and local area network server 66. Also resident 
in memory 64 and representative of the database 
management system is a coordination process 13 
for receiving transactions from an application pro- 
gram 70 on computer 11 (or from application pro- 
grams executing on computers 12A-12C) and dis- 
tributing instances of requests contained in the 
transaction to the various partitions of the database. 
Also resident in memory 64 are subordinate pro- 
cesses 72 and 74 and a savepoint 23. 

Computer 11 includes a network adapter 62 for 
formatting outgoing transmissions and for defor- 
matting incoming transmissions over channel 10. 
Computer 11 communicates with computers 12A- 
12C, which support other partitions of the database, 
over a communications channel 10. Users access 
the shared data objects through client computers 
12A-12C. Computer 12B is typical of these ma- 
chines. Computer 12B is substantially similar to 
computer 11, and includes a DASD 90 for a 
database partition, a DASD 91 for a transaction log, 
a central processing unit (CPU) 82 and an addres- 
sable RAM memory 88. Components of computer 
12B transfer data internally over a data bus 83. 
CPU 82 directly controls input peripherals 80 which 
may include a keyboard and a mouse for user 
interaction with the database. Display adapter 84 
drives a display device 86. 

In computer 12B memory 88 may include an 
application program 92, which may generate up- 
date requests for the database, and subordinate 
process 94 for handling a local instance of a dis- 
tributed request received from the coordination pro- 
cess 13 executing on computer 11. Also present is 
an operating system and LAN server 96 for use in 
establishing a communications session on network 
8 and operating computer 12B and a local 
savepoint 25. 

A transaction can result in one or more distrib- 
uted requests, each of which may span a plurality 
of partitions of the database. A given computer 
(node) may process more than one operation for 
one distributed request. Part of the job of the 
coordination process is to wait for all of the subor- 
dinate processes to report success or failure in 
handling the distributed requests. If any subordi- 
nat process reports failure, th coordination pro- 
cess signals all involved computers to rollback 
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work done for the distributed request. 

Rollback on computer 11 is done by return to a 
savepoint 23 maintained by the transaction log. 
Each computer maintains its own savepoint in 

5 memory. A savepoint at a computer consists of the 

request counter value (my_req counter) and a 

log Sequence value (my_sp Isn) and is used to 

undo changes recorded in the transaction log. Zero 
values for the variables indicate that no request 

io and no savepoint have yet issued. 

A savepoint 23 identifies an entry in the trans- 
action log to which a rollback may be made. A 
savepoint has some similarities in this respect to a 
synchronization or commit point, which marks a 

15 partition between transactions. However, a 
savepoint 23 is distinguished from a synchroniza- 
tion point in that each savepoint is local. Establish- 
ing a savepoint 23 is not the same as performing a 
commit since updates made by the transaction ar 

20 not yet visible to other transactions and will not 
become so until the transaction for which the 
savepoint is established reaches a successful end 
of transaction. 

Upon initialization of a transaction the coordina- 

25 tion process and all subordinate processes receive 
their own copies of variables designated 

my_req counter (request counter) and 

my^sp Isn (log sequence number) for a 

savepoint. These variables are maintained in local 

30 RAM memory for the computer involved and per- 
sist between distributed requests. The request 
counter is maintained at the coordinator to distin- 
guish operations that have taken place at any given 
node for different database request. The counter is 

35 initialized to zero when a database connection is 
made (i.e. a transaction is initiated) and incre- 
mented with each receipt of a database request 
(e.g., update a set of records in a relation). The 
request counter for coordination process 13 is an 

40 included parameter in each distributed request in- 
stance. 

Figure 2 is a high level flow chart illustrating 
response of coordination process 13 and the subor- 
dinate processes 72, 74 and 94 in the database 
45 upon initialization of a transaction. For each pro- 
cess the response consists of step 101, which 
provides for setting my_req_jcounter and 
my_sp^/sn equal to zero initializing the local 
savepoint. 

50 Figure 3 illustrates how the coordination pro- 

cess handles receipt of a request for distribution 
from an application program (step 103). At step 
105 the local value of my_req__counter is incre- 
mented by one. Next, at step 107 one or more 

55 request instances are sent to subordinat pro- 
cesses. The variable my__req counter is passed 

to the subordinat processes as a param ter of th 
r quest instance. Next, step 109 represents waiting 

3 
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rLl V° b8 P0Sted by the s "bordinate pro- 
cesses and collected by the coordination process 
it any request instance fails for any request it is 
detected at step The NO branch isteken'from 
step 111 1 to step 117 where the coordination pro- 
cess notifies all subordinate processes involved in 
processing the distributed request to rollback to 
the.r savepoints. Again (my__req_counten is 
passed as a parameter of the request. The co- 
o d.nat lon process also informs the appropriate ap- 
plicabon program of the failure (step 119). The YES 
branch from step 111 advances processing to step 
113, which ,s provided to collect indication of suc- 
cess from all involved subordinate processes. Until 
all subordinate processes have indicated success 
processing is returned along the NO branch to step 

ir£ f Sr processes hav * turned a success 
.nd.cat.on, step 115 is executed to notify the ap- 
pl.cat,on program of success. The system wide 
chstnbuted request is now considered a success 
and no further action is taken for the request until 
the transaction as a whole is committed. Following 
e.ther step 115 or 119 , he process is done 

Figure 4 is a high level logical flow chart 
.""strafing handling of a distributed request in- 
stance by a representative subordinate process 
Before a d.stnbuted request instance is processed 
a new savepoint may be made, replacing a prior 
savepoint for another distributed request Instance' 
The process is entered at step 121 with a deter- 
m.nat,on ,f the local version of my_req_counter 
.s equal ,n value to parameter_req_counter, the 
coord.nat.on process version of my__req_counter 
which .s passed with the distributed request in- 

TeauL parame,er P as ^d by a distributed 

request .nstance is the same as that in the node's 
current savepoint, it means that the node has al- 
ready started processing on behalf of the distrib- 
uted request. The YES branch is taken from step 
121 to leave the savepoint unchanged. If the two 
versions of the parameter disagree in va.ue, Z 
local version my_req_counter is set equal to the 
vers.on passed with the request instance by execu- 
.on of step 123 and my_req_lsn is set equal to 
the current node log sequence number. Once the 
savepoint is verified (the YES branch from step 121 
or after step 123), the distributed request instance 
may be processed (e.g., update of a record in a 

StTZ \ * Partiti ° n) - The update ™y s "cceed 
or fan. At step 127 success or failure is indicated to 

°" Pr0CeSS ' After com P'<*ng a dis- 
puted request instance, the savepoint persists in 

ZT t 0ry '. save P° in t remains available until no- 
.fica .on from the coordination process of the need 
to rollback to the savepoint, a new distributed re- 
quest .nstance is received at the node, or the 
transaction is committed. 



mJ5 ' S 3 9h ' evel logical flow chart of a 
rollback process executed by a subordinate pro- 
cess upon receipt of a rollback request from a 
coord,nat.on process. At each subordinate process, 
s the request counter value passed with the rollback 
request is compared with the value of the local 
request counter. ,f the values disagree, the current 
savepomt for the subordinate process is for a d£- 
ferent d.stributed request. No database changes 

Z* f h ee, \ made relati "9 *» ^ distributed request 
and, therefore, nothing needs to be rolled back 
The rollback request is ignored. In the flow chart 
th.s operation is represented by step 131, which 
compares the ,ocal version of n,y_re q __count 

I , V9rS ' 0n PaSSed as 3 of the 

rollback request. Upon disagreement of the values 
the process is presently exited. If the two versions 
of the request counter are in agreement, step 133 
-s executed to determine if the rollback has already 
yTsh 'J the process is exited along the 
YES branch from step 133 If the rollback has not 

tZT\ b ! 6n d ° ne ' Step 135 is ^uted to roll- 
back all changes for the current transaction (recov- 
ered from the transaction log) since my so tsn 
25 was last recorded. The process is then complete 
In parallel database management systems 
database update requests typically result in activity 
at several nodes. Rollback of all updates is re- 
qu.red if the update of any node fails. A coordina- 

oL P ;°th S m0n, ' t0rS f ° r fai ' Ure or suc ^s of 
updates. The coordinator further provides for distin- 
gu.sh.ng activities that have taken place at anv 
given node from the other nodes for different 
database update requests. Savepoints aretca" 
T ^ a llows rollback of a selected update without 
affecfing nodes which did not process the update 

a ||p l ^ e H nVenti ? n Pr0Vid6S f0r irn P'°vement in par- 
allel handl.ng of distributed requests. When a dis- 
tnbuted request is successful on all nodes there is 
« now no need for distribution of messages from the 
coordination process to remove savepoints. Further 
only one savepoint per node is required, which 
simplifies the rollback operation. 

4s nm ^ 6 inVenti ° n Can be em codied as a data 
<s processing program product for assuring atomicitv 
of d.stnbuted update requests on a p ar 2 
database management system, comprising media 
stonng: .nstruction means for distributing requS 
.nstances for a request with request savepoint data 
so .nstruct.on means responsive to receipt of a re- 
quest .nstance on a selected subordinate process 
for updating a local savepoint with the request 
savepoint data and locally generated savepoint 
data; mstrucfion means for executing the request 
- .nstance for each selected subordinate p'cesJ 
-nstruction means for indicating to a coordination 
process success or failure of execution on each 
request instance; and instruction means responsive 
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to indication of failure of execution for any request 
instance of the request for performing a rollback on 
each subordinate process which the local savepoint 
indicates executed a request instance for the re- 
quest. 

The media can further comprise: instruction 
means responsive to initiation of a transaction for 
initializing a request counter for the coordination 
process; instruction means responsive to receipt of 
a request from an application program by the co- 
ordination process for incrementing the request 
counter for the coordination process; and the in- 
struction means for distributing can include instruc- 
tion means for passing the request counter as 
request savepoint data. 

As described, locally generated savepoint data 
can include a log sequence number tracking in- 
stances of changes to a partition. 

The media can also include: instruction means 
responsive to initiation of a transaction for initializ- 
ing the local savepoint for all of a plurality of 
distributed processes by setting the local request 
counter and the log sequence number to value 
indicating no prior issuance of request savepoint 
data. 

The instruction means for updating of a local 
savepoint can further include instruction means for 
adding a current value for the log sequence num- 
ber to the savepoint. 

The media can further comprise: instruction 
means responsive to issuance of a request in- 
stance for discarding any prior savepoint for the 
selected subordinate processes; instruction means 
responsive to rolling back to a savepoint on a 
subordinate process for discarding the savepoint; 
and instruction means responsive to conclusion of 
the transaction for discarding savepoints across all 
subordinate processes. 

Claims 



of the computers for receiving requests from 
applications (70,92), for distributing over the 
network request instances and request 
savepoint data relating to the request to se- 

s lected ones of the subordinate processes and 

for issuing a rollback request with reference 
data for comparison to the local savepoints 
upon receipt of a signal indicating failure of 
execution by a subordinate process; and 

w the subordinate processes being arranged 

to respond to the reference data matching the 
local savepoints by executing a rollback. 

2. A parallel database management system as 
75 claimed in claim 1 wherever each computer 

comprises 

a log sequence number generator for each 
computer for tracking all changes to the parti- 
tion during a transaction; and 
20 means responsive to receipt of a request 

instance for placing a log sequence number 
into the local savepoint. 

3. A parallel database management system as 
25 claimed in claim 1 or claim 2, wherein the 

request savepoint data includes a value for the 
request count for the coordinator. 

4. A parallel database management system as 
30 claimed in any preceding claim, the coordina- 
tor further including a process responsive to 
entry to a transaction for initiating a request 
counter for the coordinator. 

35 -5. A parallel database management system as 
claimed in any preceding claim, in which each 
subordinate process further includes a process 
responsive to entry to a transaction for initiat- 
ing a request counter for the subordinate pro- 

40 cess and for initiating a log sequence count. 



1. A parallel database management system, com- 
prising: 

a communications network having a plural- 
ity of computers (1 1,12A,12B,12C) connected 45 
thereto; 

a database distributed among partitions 
(56,90) on the plurality of computers; charac- 
terised by 

a plurality of subordinate processes so 
(72,74,94) resident on the plurality of comput- 
ers for generating local savepoints (23,25) for 
the partitions, for executing a request instance 
and for signalling over the network failure or 
success of execution each computer cornpris- 55 
ing memory for storing the local savepoints 
(23,25) between the requests; 

a coordinator (13) resident on at least one 



6. A parallel database management system as 
claimed in any preceding claim, and further 
comprising: 

means responsive to initiation of a transac- 
tion for initializing the local savepoint for all of 
a plurality of distributed processes by setting a 
local request counter and a log sequence num- 
ber to value indicating no prior issuance of 
request savepoint data. 

7. A computer for use in parallel database man- 
agement system as claimed in any preceding 
claim, comprising: 

means to connect to the network - receiv- 
ing from the network request instanc s for a 
requ st with r quest savepoint data; 

means responsive to receipt of a request 
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instance on one or more selected subordinate 
processes in the computer for updating a local 
savepoint with the request savepoint data and 
locally generated savepoint data; 

means for executing the request instance 
for each selected subordinate process; 

means for indicating to a coordination pro- 
cess success or failure of execution on each 
request instance; and 

means responsive to indication of failure of 
execution for any request instance of the re- 
quest for performing a rollback on each subor- 
dinate process which the local savepoint in- 
dicates executed a request instance for the 
request. 

A computer as claimed in claim 7 further com- 
prising: 

a coordination process for distributing re- 
quest instances for a request with request 
savepoint data; 

means responsive to initiation of a transac- 
tion for initializing a request counter for the 
coordination process; 

means responsive to receipt of a request 
from an application program by the coordina- 
tion process for incrementing the request 
counter for the coordination process; and 

the means for distributing including means 
for passing the request counter as request 
savepoint data. 



9. A computer as claimed in claim 7 or claim 8, 
and further comprising: 

means responsive to issuance of a request 
instance for discarding any prior savepoint for 
the or each selected subordinate process; 

means responsive to rolling back to a 
savepoint on a subordinate process for dis- 
carding to savepoint; and 

means responsive to conclusion of the 
transaction for discarding savepoints across all 
subordinate processes. 



10 



15 



20 



25 



30 



10. A method for managing distributed requests in 
a parallel database management system com- 
prising: 

distributing request instances for a request 
with request savepoint data; 

responsive to receipt of a request instance 
on a selected subordinate process, updating a 
local savepoint with the request savepoint data 
and locally generated savepoint data; 

executing the request instance for each 
selected subordinate process; 

indicating to a coordination process suc- 
cess or failure of execution on each request 
instance; and 



35 



40 



45 



50 



55 



responsive to indication of failure of execu- 
tion for any request instance of the request 
performing a rollback on each subordinate pro- 
cess which the local savepoint indicates ex- 
ecuted a request instance for the request 
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ENTER ON 
TRANSACTION INIT. 
FOR COORDINATOR 
OR SUBORDINATE 
PROCESS 



Fig. 2 
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SET 

UY.REQ.COUNTER 
= 8 AND SET 
MY.SP.LSN = B 
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END 



r 
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ENTER ON 
SUBORDINATE 
PROCESS ON RECEIPT 
OF REQUEST 




SET MY_REQ_CNTR 
= PARA.REQ.CNTR 
AND SET MY. SP LSN 
TO CURRENT 
NODE LSN 



HANDLE REQUEST 
( E.G. UPDATE 
A ROW IN THE 
PARTITION ) 



I 



■ f\ 2 5 



RETURN SUCCESS 
OR FAILURE 
INDICATION TO 
COORDINATOR 



c 
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Fig. 4 



DONE 
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ENTER ON 
SUBORDINATE PROCESS 
TO HANDLE ROLL BACK 
TO SAY EPOINT REQUEST 
FROM COORDINATOR 




< MY.REQ.CNTR = JO 
PARA.REQ.CNTR 




. HAS ROLL BACK 
< ALREADY BEEN 
DONE FOR NODE ? 




ROLL BACK ALL 
CHANGES MADE FOR 
THIS TRANSACTION, 
SINCE MY.SP.LSN 
WAS RECORDED 



Fig. 5 
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MY.REQ. COUNTER 
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TO ALL SUBORDINATE 
PROCESSES. PASS 
UY.REQ. COUNTER 
AS PARAMETER 
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